TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A formal framework for linguistic annotation

Internal identifier: 000297 (Main/Exploration); previous: 000296; next: 000298

Authors: Steven Bird [United States]; Mark Liberman [United States]

Source:

RBID : ISTEX:CE38257115F73D7CD5D71EEDCCC26FBE73C26383

Abstract

'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions – audio, video and/or physiological recordings – or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, coreference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focused on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
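
The abstract names the annotation graph but does not define it. As a rough illustration only, the sketch below assumes one common reading of the idea: a directed graph whose nodes may carry a time offset and whose arcs carry an annotation type and a label. The class names `Node`, `Arc` and `AnnotationGraph`, the `search` method and the sample offsets are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    ident: str
    offset: Optional[float] = None  # time offset in seconds; None if unknown


@dataclass(frozen=True)
class Arc:
    start: Node
    end: Node
    kind: str   # annotation type, e.g. "word" or "phone" (assumed names)
    label: str  # the annotation content itself


@dataclass
class AnnotationGraph:
    arcs: List[Arc] = field(default_factory=list)

    def add(self, start: Node, end: Node, kind: str, label: str) -> Arc:
        """Construct: attach a labelled arc between two nodes."""
        arc = Arc(start, end, kind, label)
        self.arcs.append(arc)
        return arc

    def search(self, kind=None, begin=None, end=None) -> List[Arc]:
        """Search: arcs of a given type lying wholly inside [begin, end]."""
        hits = []
        for arc in self.arcs:
            if kind is not None and arc.kind != kind:
                continue
            s, e = arc.start.offset, arc.end.offset
            # Arcs with an unknown offset cannot be placed in a time window.
            if begin is not None and (s is None or s < begin):
                continue
            if end is not None and (e is None or e > end):
                continue
            hits.append(arc)
        return hits


# Two annotation layers (phones and words) sharing the same timed nodes.
n0, n1 = Node("n0", 0.0), Node("n1", 0.15)
n2, n3 = Node("n2", 0.32), Node("n3", 0.60)
graph = AnnotationGraph()
graph.add(n0, n1, "phone", "sh")
graph.add(n1, n2, "phone", "iy")
graph.add(n0, n2, "word", "she")
graph.add(n2, n3, "word", "had")

print([a.label for a in graph.search(kind="word")])
print([a.label for a in graph.search(begin=0.1, end=0.7)])
```

Because both layers hang off the same nodes, a word arc and the phone arcs it spans stay aligned without either layer embedding the other, which is one way to read the abstract's claim of consistency with many alternative file formats.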

Url:
DOI: 10.1016/S0167-6393(00)00068-6


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>A formal framework for linguistic annotation</title>
<author>
<name sortKey="Bird, Steven" sort="Bird, Steven" uniqKey="Bird S" first="Steven" last="Bird">Steven Bird</name>
</author>
<author>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:CE38257115F73D7CD5D71EEDCCC26FBE73C26383</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0167-6393(00)00068-6</idno>
<idno type="url">https://api.istex.fr/document/CE38257115F73D7CD5D71EEDCCC26FBE73C26383/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000356</idno>
<idno type="wicri:Area/Istex/Curation">000356</idno>
<idno type="wicri:Area/Istex/Checkpoint">000247</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000247</idno>
<idno type="wicri:doubleKey">0167-6393:2001:Bird S:a:formal:framework</idno>
<idno type="wicri:Area/Main/Merge">000321</idno>
<idno type="wicri:Area/Main/Curation">000297</idno>
<idno type="wicri:Area/Main/Exploration">000297</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">A formal framework for linguistic annotation</title>
<author>
<name sortKey="Bird, Steven" sort="Bird, Steven" uniqKey="Bird S" first="Steven" last="Bird">Steven Bird</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Linguistic Data Consortium, University of Pennsylvania, 3615 Market Street, Philadelphia, PA 19104-2608</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Linguistic Data Consortium, University of Pennsylvania, 3615 Market Street, Philadelphia, PA 19104-2608</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Speech Communication</title>
<title level="j" type="abbrev">SPECOM</title>
<idno type="ISSN">0167-6393</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">1–2</biblScope>
<biblScope unit="page" from="23">23</biblScope>
<biblScope unit="page" to="60">60</biblScope>
</imprint>
<idno type="ISSN">0167-6393</idno>
</series>
<idno type="istex">CE38257115F73D7CD5D71EEDCCC26FBE73C26383</idno>
<idno type="DOI">10.1016/S0167-6393(00)00068-6</idno>
<idno type="PII">S0167-6393(00)00068-6</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0167-6393</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">`Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions – audio, video and/or physiological recordings – or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, coreference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focused on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Bird, Steven" sort="Bird, Steven" uniqKey="Bird S" first="Steven" last="Bird">Steven Bird</name>
</region>
<name sortKey="Bird, Steven" sort="Bird, Steven" uniqKey="Bird S" first="Steven" last="Bird">Steven Bird</name>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
</country>
</tree>
</affiliations>
</record>
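
For readers without Dilib, the record above can also be read with a general-purpose XML parser. The sketch below is a minimal example, not part of the Wicri tooling. It assumes the record is available as a string and uses Python's standard `xml.etree.ElementTree`. The record uses the `wicri:` prefix without declaring it, so the code binds it to a placeholder namespace URI (`urn:example:wicri`, an invented value) before parsing. The helper names `parse_record` and `summarize` are hypothetical, and `SAMPLE` is a trimmed copy of the record.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the record never declares the "wicri:" prefix,
# so a placeholder binding is added to satisfy a namespace-aware parser.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Parse a <record>, binding the undeclared wicri: prefix first."""
    patched = xml_text.replace(
        "<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1
    )
    return ET.fromstring(patched)


def summarize(record):
    """Collect title, authors, DOI and journal from the biblStruct element."""
    bibl = record.find(".//sourceDesc/biblStruct")
    return {
        "title": bibl.findtext("analytic/title"),
        "authors": [n.text for n in bibl.findall("analytic/author/name")],
        "doi": bibl.findtext("idno[@type='DOI']"),
        "journal": bibl.findtext("series/title"),
    }


# Trimmed copy of the record shown above, kept short for illustration.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader><fileDesc><sourceDesc>
<biblStruct>
<analytic>
<title level="a">A formal framework for linguistic annotation</title>
<author><name>Steven Bird</name></author>
<author><name>Mark Liberman</name></author>
</analytic>
<series><title level="j">Speech Communication</title></series>
<idno type="DOI">10.1016/S0167-6393(00)00068-6</idno>
</biblStruct>
</sourceDesc></fileDesc></teiHeader>
</TEI>
</record>"""

info = summarize(parse_record(SAMPLE))
print(info["title"])
print(info["authors"], info["journal"], info["doi"])
```

Without the patch, `ET.fromstring(SAMPLE)` raises `ParseError` because of the unbound `wicri:` prefix. Note also that the DOI appears twice in the full record with different `type` spellings (`doi` under `publicationStmt`, `DOI` under `biblStruct`); the sketch reads the latter.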

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000297 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000297 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:CE38257115F73D7CD5D71EEDCCC26FBE73C26383
   |texte=   A formal framework for linguistic annotation
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024